STATEMENT OF THE PROBLEM STUDIED


Research on this contract was directed towards areas of mathematics and numerical computation which have applications to image/signal processing. The research can be broadly classified into the following areas: (1) compressed sensing, (2) sparse representation and encoding for digital elevation maps, (3) learning theory, and (4) high-dimensional approximation.


SUMMARY OF THE MOST IMPORTANT RESULTS
1. Encoding Signals: Compressed Sensing

The classical paradigm for encoding signals is to model signals as bandlimited functions. This leads to the Shannon sampling theorem, which says that a signal can be captured from equally spaced time samples provided the samples are at most 1/(2A) apart, where A is the bandwidth. Many encoders and decoders are built on this theory, and in many cases they sample at even faster rates (e.g. Sigma-Delta modulation schemes). The problem with these encoders is that sensors cannot physically sample broadband signals at the necessary rate. On the other hand, most signals we are attempting to capture have much less information content than a general bandlimited signal. The question then arises whether we can build encoders that sample at a rate closer to the information rate of the signal than to its Nyquist rate.
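
As a small numerical illustration of the sampling theorem (our own sketch, not part of the contract work), the Python snippet below reconstructs a signal of bandwidth A from samples spaced T = 1/(2A) apart with the Whittaker-Shannon interpolation formula f(t) = Σ_n f(nT) sinc((t - nT)/T). The bandwidth, the test signal and the window length are arbitrary assumptions. Because the infinite series is truncated to a finite window of samples, the reconstruction is only approximate and is evaluated away from the window edges.

```python
import numpy as np

def shannon_reconstruct(samples, T, t):
    """Whittaker-Shannon interpolation from samples taken at times n*T, n = 0, 1, ..."""
    n = np.arange(len(samples))
    # np.sinc(u) = sin(pi*u) / (pi*u); each sample contributes one shifted kernel
    kernel = np.sinc((t[:, None] - n[None, :] * T) / T)
    return kernel @ samples

A = 5.0               # assumed bandwidth: no frequency content above A Hz
T = 1.0 / (2.0 * A)   # sample spacing 1/(2A)

def signal(t):
    # test signal with frequencies 3 Hz and 4.5 Hz, both below A
    return np.sin(2 * np.pi * 3.0 * t) + 0.5 * np.cos(2 * np.pi * 4.5 * t)

samples = signal(np.arange(400) * T)      # 40 seconds of samples
t = np.linspace(15.0, 25.0, 50)           # stay away from the ends of the finite window
err = np.max(np.abs(shannon_reconstruct(samples, T, t) - signal(t)))
print(f"max reconstruction error: {err:.1e}")
```

Widening the sample window shrinks the truncation error; the sampling rate itself never needs to exceed 2A.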

The new field of compressed sensing addresses exactly this point. It shows that when a signal is sparse (when represented with respect to some basis) or, more generally, compressible, it is sufficient to sample at the sparsity rate. This field is only now emerging, and there are many analytic questions centering on how to do this sampling, how to do the decoding, and what the provable performance of such systems is.

Our contributions to compressed sensing centered on developing precise measures of the performance of sensing systems and then determining which systems are optimal. In [ODD] we have proven the best bounds for the performance of compressed sensing systems by comparing their performance with best k-term approximation. We describe precisely when such a system can perform comparably to best k-term approximation. This is determined by the number of samples (which must be slightly larger than k) and the type of samples (the compressed sensing matrix should satisfy a Restricted Isometry Property (RIP)). We also define a concept of best performance (instance optimality) in probability.
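
To make the benchmark concrete: the best k-term approximation error of x in R^N in the ℓ1 norm is σ_k(x) = min over k-sparse z of ||x - z||_1, attained by keeping the k entries of x that are largest in magnitude, and instance optimality of an encoder-decoder pair (Φ, Δ) asks for ||x - Δ(Φx)||_1 ≤ C σ_k(x) for every x. A minimal sketch computing σ_k (the function names are ours, chosen for illustration):

```python
import numpy as np

def best_k_term(x, k):
    """Keep the k entries of largest magnitude and zero out the rest."""
    z = np.zeros_like(x)
    if k > 0:
        idx = np.argsort(np.abs(x))[-k:]
        z[idx] = x[idx]
    return z

def sigma_k(x, k, p=1):
    """Best k-term approximation error sigma_k(x) measured in the l_p norm."""
    return np.linalg.norm(x - best_k_term(x, k), ord=p)

x = np.array([0.1, -5.0, 0.0, 2.0, -0.3, 1.0])
print(sigma_k(x, 2))   # approximately 1.4 (= 0.1 + 0.3 + 1.0, after keeping -5.0 and 2.0)
```

A k-sparse x has σ_k(x) = 0, so an instance-optimal decoder must recover it exactly.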

It is now well understood that the optimal matrices for compressed sensing are generated by random processes such as Gaussian or Bernoulli ensembles. An outstanding question has been which decoders perform optimally when used in conjunction with random matrices. In [DPW1] we prove that ℓ1 minimization is an optimal decoder for very general (sub-Gaussian) random matrices. In [CDD1], we show that greedy algorithms provide almost optimal decoders for random matrices.
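
The snippet below sketches one well-known greedy decoder, orthogonal matching pursuit (OMP), applied to Gaussian measurements of a sparse vector. It is an illustration under assumed sizes (m = 100 measurements, N = 256, k = 5) and is not necessarily the greedy algorithm analyzed in [CDD1]; with these sizes exact recovery is typical, but it is not guaranteed for every random draw.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: k greedy column selections, each followed by a
    least-squares refit of y on all columns selected so far."""
    N = Phi.shape[1]
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # column most correlated with residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
m, N, k = 100, 256, 5
Phi = rng.standard_normal((m, N)) / np.sqrt(m)        # Gaussian measurement matrix
x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.choice([-1.0, 1.0], k) * (1.0 + rng.random(k))
y = Phi @ x                                            # m measurements of a k-sparse vector
x_hat = omp(Phi, y, k)
print("max recovery error:", np.max(np.abs(x_hat - x)))
```

The least-squares refit keeps the residual orthogonal to every selected column, which is what distinguishes OMP from plain matching pursuit.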

While random matrices are optimal for compressed sensing, they do not always fit well with applications where randomness may not be implementable. Therefore, there is great interest in deterministic constructions of compressed sensing matrices. In [D], we use number theory to construct the best-performing deterministic systems currently known.
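
For illustration, one construction of this number-theoretic kind uses polynomials over a finite field: rows are indexed by pairs (x, y) in F_p × F_p, columns by polynomials Q of degree at most r, and an entry is nonzero exactly when y = Q(x). Since two distinct such polynomials agree in at most r points, the normalized columns have coherence at most r/p, and a Gershgorin argument then gives the RIP of order k with constant at most (k - 1)r/p. The sketch below builds this matrix for assumed values p = 7, r = 2; we do not claim that it reproduces every detail of [D].

```python
import itertools
import numpy as np

def polynomial_cs_matrix(p, r):
    """Rows indexed by pairs (x, y) in F_p x F_p, columns by polynomials Q over F_p of
    degree <= r; the entry is 1/sqrt(p) when y = Q(x) and 0 otherwise. p must be prime."""
    cols = []
    for coeffs in itertools.product(range(p), repeat=r + 1):
        col = np.zeros(p * p)
        for x in range(p):
            y = sum(c * x**i for i, c in enumerate(coeffs)) % p   # evaluate Q(x) mod p
            col[x * p + y] = 1.0
        cols.append(col)
    return np.column_stack(cols) / np.sqrt(p)   # each column has p ones -> unit norm

p, r = 7, 2
Phi = polynomial_cs_matrix(p, r)            # p**2 = 49 rows, p**(r+1) = 343 columns
G = np.abs(Phi.T @ Phi)
np.fill_diagonal(G, 0.0)
print(Phi.shape, "coherence:", G.max(), "bound r/p:", r / p)
```

The matrix is fully deterministic and binary up to scaling, which is what makes such constructions attractive when random sampling cannot be implemented.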

In another work, [BDDW], we have given a simple proof that certain random processes generate matrices satisfying the RIP. This gives the most accessible verification of the RIP for classical random matrices such as Gaussian or Bernoulli ensembles.
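
The proof itself is not reproduced here. As a purely empirical sanity check (which cannot certify the RIP, since it only samples some sparse vectors), the sketch below measures how far ||Φx||² / ||x||² strays from 1 over random k-sparse x for Gaussian and Bernoulli matrices scaled by 1/√m. The sizes and the number of trials are arbitrary assumptions.

```python
import numpy as np

def empirical_rip_deviation(Phi, k, trials, rng):
    """Largest observed | ||Phi x||^2 / ||x||^2 - 1 | over random k-sparse x.
    This is only a Monte Carlo lower estimate of the RIP constant delta_k."""
    N = Phi.shape[1]
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(N)
        x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
        ratio = np.sum((Phi @ x) ** 2) / np.sum(x ** 2)
        worst = max(worst, abs(ratio - 1.0))
    return worst

rng = np.random.default_rng(1)
m, N, k = 200, 1000, 5
gaussian = rng.standard_normal((m, N)) / np.sqrt(m)
bernoulli = rng.choice([-1.0, 1.0], size=(m, N)) / np.sqrt(m)
dev_gauss = empirical_rip_deviation(gaussian, k, 500, rng)
dev_bern = empirical_rip_deviation(bernoulli, k, 500, rng)
print(dev_gauss, dev_bern)
```

Both ensembles concentrate ||Φx||² around ||x||² in the same way, which is the property the simple proof exploits.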

2. Digital Elevation Maps

One of the focal applications of this research is the compression of Digital Elevation Maps (DEMs). DEMs are usually rendered as 3-D surfaces, and image processing techniques are not appropriate for processing these maps. We have stressed the importance of developing data compression in the framework of new metrics (such as the Hausdorff metric) which incorporate the geometry of DEMs and are also better suited to their intended applications.
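
A small example of why the metric matters: the sketch below compares a terrain cross-section containing a cliff with a copy in which the cliff is shifted by one grid cell. The vertical (sup-norm) error is the full cliff height, while the Hausdorff distance between the two point sets is only one grid spacing. The profile is hypothetical, and horizontal and vertical coordinates are assumed to be measured in the same arbitrary units.

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between finite point sets A (n x D) and B (m x D).
    Builds the full n x m distance table, so it is meant for small examples only."""
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(dist.min(axis=1).max(), dist.min(axis=0).max())

# Cross-section with a cliff of height 10, and a copy with the cliff moved by one grid cell.
x = np.linspace(0.0, 1.0, 11)                        # grid spacing 0.1
z_true = 10.0 * (np.arange(11) >= 5)
z_approx = 10.0 * (np.arange(11) >= 6)
A = np.column_stack([x, z_true])
B = np.column_stack([x, z_approx])

vertical_error = np.max(np.abs(z_true - z_approx))   # 10.0: the sup-norm sees the whole cliff
hd = hausdorff(A, B)                                  # about 0.1: the cliff is one cell off
print(vertical_error, hd)
```

An encoder judged in the sup-norm would spend bits fixing a geometrically negligible shift; the Hausdorff metric does not penalize it.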

Our research has been directed along two fronts. The first is to determine the Kolmogorov entropy in the Hausdorff metric of various model classes for DEMs. This has led to many interesting results [CDDD] for classes such as BV or piecewise smooth functions. To complete this direction, we want to incorporate more geometry into the model classes, since we feel this captures the spirit of DEMs.

The second front of our research in DEM compression is directed at the development of algorithms and encoders for DEMs. Some of the desired features of the algorithms under development are: (a) high compression, (b) robust error handling, (c) progressive transmission of the data, (d) quick rendering, and (e) burning in (tunnelling) and line-of-sight display. Since almost all graphics hardware uses triangular polygonal patches as building blocks for object description, we focus our attention on algorithms utilizing meshes of polygonal elements. We have investigated several types of algorithms and encoders:

• Nonlinear approximation algorithms based on adaptive multiresolution analysis;

• Greedy (insertion or removal) algorithms for mesh construction which utilize Delaunay triangulation;

• Progressive encoding based on level sets.

The first algorithms include: (a) an initial coarse adaptive triangulation which allows a good low-resolution approximation, (b) wavelet decomposition of the function for achieving a sparse representation of the function (surface), (c) conversion to a hierarchical B-spline representation and application of the nonlinear uniform approximation scheme from [DJL] and [DPY], and (d) compression and progressive transmission of the data using the hierarchical representation.
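
As a one-dimensional illustration of steps (b) and (d), the sketch below applies the orthonormal Haar wavelet (an assumption made for brevity; the schemes cited above use B-splines and other bases) to a terrain profile and keeps only the N largest coefficients, which is nonlinear N-term approximation. A profile with a single step needs at most one nonzero detail coefficient per level, so very few coefficients reproduce it exactly, and the coarse-to-fine coefficient order lends itself to progressive transmission.

```python
import numpy as np

def haar_forward(f):
    """Orthonormal multilevel Haar transform of a signal of length 2^J.
    Output layout: [coarsest average, coarsest detail, ..., finest details]."""
    c = np.asarray(f, dtype=float).copy()
    n = len(c)
    while n > 1:
        avg = (c[0:n:2] + c[1:n:2]) / np.sqrt(2.0)
        det = (c[0:n:2] - c[1:n:2]) / np.sqrt(2.0)
        c[: n // 2], c[n // 2 : n] = avg, det
        n //= 2
    return c

def haar_inverse(c):
    """Inverse of haar_forward."""
    f = np.asarray(c, dtype=float).copy()
    n = 1
    while n < len(f):
        avg, det = f[:n].copy(), f[n : 2 * n].copy()
        f[0 : 2 * n : 2] = (avg + det) / np.sqrt(2.0)
        f[1 : 2 * n : 2] = (avg - det) / np.sqrt(2.0)
        n *= 2
    return f

def n_term_approx(f, N):
    """Nonlinear approximation: keep the N Haar coefficients of largest magnitude."""
    c = haar_forward(f)
    keep = np.argsort(np.abs(c))[-N:]
    c_N = np.zeros_like(c)
    c_N[keep] = c[keep]
    return haar_inverse(c_N)

# A 64-sample terrain profile: flat ground, then a plateau starting at sample 37.
profile = np.where(np.arange(64) >= 37, 1.0, 0.0)
approx = n_term_approx(profile, 7)   # 1 average + at most one nonzero detail per level (6 levels)
print(np.max(np.abs(approx - profile)))
```

Which coefficients survive depends on the data, not on a fixed frequency cutoff; this adaptivity is what makes the approximation nonlinear.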

The greedy removal algorithm is a recursive procedure with the following basic elements: (a) determining and updating a significance table for the grid points, (b) removing the least significant points one at a time, and (c) updating the mesh after each removal with a Delaunay triangulation algorithm. The greedy insertion algorithm uses the same elements but in reverse order. We pay special attention to the data structure that enables us to compress and transmit the data progressively.
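
The following is a simplified sketch of greedy removal on a one-dimensional terrain profile, where the "mesh" is just the polyline through the kept points (standing in for the Delaunay triangulation of the 2-D case). The significance of a point is taken, as an assumption, to be the vertical error its removal would create; the function names are ours.

```python
import numpy as np

def significance(x, z, left, i, right):
    """Vertical error created at point i if it is removed and its current
    neighbours are joined by a straight segment."""
    t = (x[i] - x[left]) / (x[right] - x[left])
    return abs(z[i] - ((1.0 - t) * z[left] + t * z[right]))

def greedy_removal(x, z, n_keep):
    """Remove interior points one at a time, least significant first, until
    n_keep points remain. Returns the kept indices and the removal order;
    reading the removal order backwards gives a coarse-to-fine insertion order."""
    kept = list(range(len(x)))
    removed = []
    while len(kept) > max(n_keep, 2):
        # Rebuild the significance table for the current interior points. An efficient
        # implementation would update only the two neighbours of the removed point.
        table = [significance(x, z, kept[j - 1], kept[j], kept[j + 1])
                 for j in range(1, len(kept) - 1)]
        j = 1 + int(np.argmin(table))
        removed.append(kept.pop(j))
    return kept, removed

x = np.linspace(0.0, 1.0, 9)
z = np.abs(x - 0.5)                   # a V-shaped valley profile
kept, removed = greedy_removal(x, z, n_keep=3)
print(kept)                           # [0, 4, 8]: the two ends and the valley floor
```

Transmitting the kept points first and then the removed points in reverse order gives a progressively refined mesh.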

The level set method seeks first to give a progressive description of the surface in terms of level curves and Morse trees. We prioritize the level curves and ridge curves and then encode each of them in a progressive manner. The remainder of the surface is then extrapolated from this information by blending or interpolation.

The level curves are compressed using a multiscale decomposition as described in [BDDD]. This paper also proves several theorems establishing the efficiency of this method of representing and encoding curves.
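
We do not reproduce the decomposition of [BDDD] here. As a hedged illustration of the general idea, the sketch below uses a simple midpoint-prediction scheme for a polyline with 2^J + 1 vertices: keep the endpoints and, level by level, store each new vertex as its offset from the midpoint of the coarser chord. Offsets below a tolerance ε are dropped; since averaging never increases the error, every reconstructed vertex is then within J·ε of the original. The test curve and ε are assumptions.

```python
import numpy as np

def curve_decompose(P):
    """P: (2**J + 1, dim) array of curve vertices. Returns the two endpoints and a
    coarse-to-fine list of detail arrays: each new vertex minus the midpoint of the
    chord joining its two neighbours on the coarser level."""
    n = len(P) - 1
    assert n > 0 and (n & (n - 1)) == 0, "number of segments must be a power of two"
    details, step = [], n
    while step > 1:
        half = step // 2
        mids = np.arange(half, n, step)
        details.append(P[mids] - 0.5 * (P[mids - half] + P[mids + half]))
        step = half
    return P[[0, n]].copy(), details

def curve_reconstruct(ends, details):
    """Inverse of curve_decompose (the details may have been thresholded)."""
    n = 2 ** len(details)
    Q = np.zeros((n + 1, ends.shape[1]))
    Q[[0, n]] = ends
    step = n
    for d in details:
        half = step // 2
        mids = np.arange(half, n, step)
        Q[mids] = 0.5 * (Q[mids - half] + Q[mids + half]) + d
        step = half
    return Q

theta = np.linspace(0.0, np.pi, 65)                   # a semicircular level curve, J = 6
P = np.column_stack([np.cos(theta), np.sin(theta)])
ends, details = curve_decompose(P)

eps = 5e-3                                             # assumed tolerance
kept = [np.where(np.linalg.norm(d, axis=1, keepdims=True) > eps, d, 0.0) for d in details]
n_kept = sum(int(np.count_nonzero(np.linalg.norm(d, axis=1) > eps)) for d in details)
Q = curve_reconstruct(ends, kept)
err = np.max(np.linalg.norm(Q - P, axis=1))
print(n_kept, "of", sum(len(d) for d in details), "details kept; max error", err)
```

For a smooth curve the offsets shrink rapidly from level to level, so most of the fine-level details fall below ε and need not be transmitted.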

A major question that we are studying in detail is which surfaces can be compressed well using level set methods. In this direction, we have introduced new anisotropic spaces of functions in [DPW] and shown that surfaces which are graphs of these functions can be compressed well with level set methods. These new anisotropic spaces are completely different from the anisotropic spaces usually studied in harmonic analysis and PDEs. For example, functions with large gradients are, in a certain sense, nice functions with respect to this family of spaces. The correct description of these spaces when measuring higher order smoothness has yet to be completely worked out. We believe that these spaces will play an important role in analysis, not only for surface compression but also for the study of nonlinear evolution equations.

The theory and algorithms behind our methods for surface compression are developed in [BDHJKLS].

3. Learning Theory

A typical application of surface processing is to generate a faithful representation of noisy point cloud data associated with a given surface. This can be viewed as a regression problem in learning theory, where the noisy data are drawn from an unknown underlying probability distribution. The noise arises from sensor noise, sensor jitter, errors in global positioning, misclassification of points on the surface, etc. In [DKPT] we have developed a general theory which describes when learning algorithms are optimal and gives the theoretical framework for creating optimal algorithms. In [BCDDT, BCDD, BCDD1] we have developed an adaptive algorithm (an alternative to model selection) which is shown to be optimal (in a certain sense) for learning the regression function from a given data set. This technology has been applied to learning surfaces generated in real time in the autonomous navigation of Micro Air Vehicles (MAVs) (see [KNPDBDS]).
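
As an illustration only, the sketch below fits a piecewise-constant regression function on an adaptively refined dyadic partition of [0, 1): a cell is split when doing so reduces the empirical squared error by more than a threshold τ. This one-step splitting rule, the synthetic data and the choice τ = 1 are assumptions for the example; it is a simplified stand-in and not the algorithm of [BCDDT, BCDD, BCDD1].

```python
import numpy as np

def adaptive_fit(x, y, tau, lo=0.0, hi=1.0, depth=0, max_depth=10):
    """Piecewise-constant regression on dyadic subintervals of [lo, hi).
    A cell is split in two when that lowers the empirical squared error by more
    than tau; returns a list of leaves (lo, hi, value)."""
    in_cell = (x >= lo) & (x < hi)
    xs, ys = x[in_cell], y[in_cell]
    value = float(ys.mean()) if ys.size else 0.0
    if depth < max_depth and ys.size >= 2:
        mid = 0.5 * (lo + hi)
        halves = [ys[xs < mid], ys[xs >= mid]]
        err_parent = np.sum((ys - value) ** 2)
        err_children = sum(np.sum((h - h.mean()) ** 2) for h in halves if h.size)
        if err_parent - err_children > tau:
            return (adaptive_fit(x, y, tau, lo, mid, depth + 1, max_depth)
                    + adaptive_fit(x, y, tau, mid, hi, depth + 1, max_depth))
    return [(lo, hi, value)]

def predict(leaves, t):
    """Evaluate the piecewise-constant estimator at a point t in [0, 1)."""
    for lo, hi, value in leaves:
        if lo <= t < hi:
            return value
    raise ValueError("t outside [0, 1)")

rng = np.random.default_rng(4)
x = rng.random(2000)
y = (x >= 0.3).astype(float) + 0.1 * rng.standard_normal(x.size)   # noisy step
leaves = adaptive_fit(x, y, tau=1.0)
print(len(leaves), predict(leaves, 0.1), predict(leaves, 0.9))
```

The partition refines only around the jump at 0.3 and stays coarse where the regression function is flat, which is the point of adaptivity.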

A major question in the development of learning algorithms is their computational speed and whether they can handle streaming data. This is especially true in high dimensions, where the curse of dimensionality can have a debilitating effect. Directed at this problem, we have constructed and analyzed greedy algorithms for learning in [BaCDD] which are provably optimal in performance and computational speed.

A second area of learning theory that is important in many applications of signal and image processing is classification. We have developed a new mathematical theory for binary classification using reliable sets in [CDDS]. The algorithms build a classifier from training sets by using set partitioning. We give bounds in probability on the performance of the classifier as compared to the Bayes classifier. We are now building practical classifiers based on sparse tree approximation and other recursive partitioning algorithms.
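
The simplest classifier built from a set partition is a majority vote in each cell. The sketch below (a plain dyadic histogram rule on assumed synthetic data, not the method of [CDDS]) fits such a rule to noisy labels and measures how often it agrees with the Bayes classifier, which is known here because the data are synthetic.

```python
import numpy as np
from collections import defaultdict

def fit_partition_classifier(X, labels, level):
    """Majority vote of 0/1 labels inside each dyadic cube of side 2**-level in [0,1)^d."""
    votes = defaultdict(lambda: [0, 0])
    cells = np.floor(X * 2**level).astype(int)
    for cell, label in zip(map(tuple, cells), labels):
        votes[cell][int(label)] += 1
    return {cell: int(v[1] > v[0]) for cell, v in votes.items()}

def classify(rule, X, level, default=0):
    """Label new points by the vote of their cell; empty cells get `default`."""
    cells = np.floor(X * 2**level).astype(int)
    return np.array([rule.get(tuple(c), default) for c in cells])

rng = np.random.default_rng(5)
X = rng.random((5000, 2))
bayes = (X[:, 0] + X[:, 1] > 1.0).astype(int)          # noise-free (Bayes) labels
flip = rng.random(5000) < 0.1                           # 10% label noise
labels = np.where(flip, 1 - bayes, bayes)

rule = fit_partition_classifier(X, labels, level=3)
X_test = rng.random((2000, 2))
agreement = np.mean(classify(rule, X_test, 3) == (X_test[:, 0] + X_test[:, 1] > 1.0))
print(f"agreement with the Bayes classifier: {agreement:.3f}")
```

The remaining disagreement comes from the cells cut by the decision boundary; finer or adaptive partitions near the boundary reduce it, at the cost of fewer training points per cell.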

4. Beating the Curse of Dimension

One of the drawbacks of adaptive methods in learning and other application domains is that they are computationally expensive for high-dimensional problems. For example, if the Euclidean space dimension is d, then partitioning just one cell into its children results in 2^d cells, so this is impossible to implement when d is larger than 20 or so. The usual method for circumventing this difficulty is to use kernel methods such as Mercer kernels or support vector machines. We find these unsatisfactory on many problems since the representations are not local. We have tried to develop alternative methods that retain the locality of the representation and still treat high dimensions. Our results are in two directions: greedy algorithms and sparse tree approximation.

In greedy algorithms one seeks a representation of a function (signal, image, etc.) as a linear combination of a few elements from a redundant family of waveforms (called a dictionary). There is a long history to such algorithms. Our recent work [BaCDD] has identified the performance of such algorithms and shown how our analysis can be applied to learning problems to significantly cut down on the computational complexity of generating an approximation to the regression function.
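
The sketch below shows the simplest such method, the pure greedy algorithm (matching pursuit), applied to noisy regression data with a small redundant dictionary of cosines and step functions. The dictionary, the data and the number of steps are assumptions for illustration; [BaCDD] treats greedy algorithms more generally, and orthogonal or relaxed variants are often preferred in practice. Because every dictionary element has unit norm, each step lowers the squared residual norm by exactly the square of the selected inner product.

```python
import numpy as np

def pure_greedy(D, y, n_steps):
    """Pure greedy algorithm (matching pursuit) with a dictionary D whose
    columns have unit Euclidean norm. Returns coefficients, the final residual
    and the residual norm after each step."""
    coef = np.zeros(D.shape[1])
    residual = np.asarray(y, dtype=float).copy()
    norms = [np.linalg.norm(residual)]
    for _ in range(n_steps):
        inner = D.T @ residual
        j = int(np.argmax(np.abs(inner)))      # best-matching dictionary element
        coef[j] += inner[j]                    # an element may be selected more than once
        residual -= inner[j] * D[:, j]
        norms.append(np.linalg.norm(residual))
    return coef, residual, norms

rng = np.random.default_rng(2)
x = np.sort(rng.random(300))
cosines = [np.cos(np.pi * k * x) for k in range(20)]
steps = [(x >= t).astype(float) for t in np.linspace(0.05, 0.95, 19)]
D = np.column_stack(cosines + steps)
D /= np.linalg.norm(D, axis=0)                 # unit empirical norm for each element
y = 2.0 * np.cos(3 * np.pi * x) + (x >= 0.5) + 0.1 * rng.standard_normal(x.size)
coef, residual, norms = pure_greedy(D, y, n_steps=10)
print(np.round(norms, 3))
```

Each step costs one pass of inner products over the dictionary, so the work grows linearly with the dictionary size and the number of steps rather than with a full least-squares solve.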

The rough idea behind sparse trees is to only look at those children in an adaptive partition that contain data points. When the ambient space dimension is large, only a few cells (determined by the size of the data set) contain data points. In [BDDL] we are developing a theory for the performance of sparse tree approximation and commensurate algorithms for its implementation. We are applying this technology to problems in meteorology together with scientists at the University of Maryland. In this application to long-term weather forecasting, the ambient space dimension is d > 200.
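
The sketch below illustrates the basic saving for a single cell: splitting the unit cube in d = 200 dimensions at its midpoint would create 2^200 children, but grouping the data points by which side of the midpoint they fall on in each coordinate creates only the nonempty children, at most one per point. The data and the function names are hypothetical; recursing into each nonempty child in the same way yields a sparse tree.

```python
import numpy as np

def occupied_children(X, idx, lo, hi):
    """Split the box [lo, hi) at its midpoint in every coordinate, but create only
    the children that actually contain data. Returns {bit pattern: point indices},
    where bit k of a pattern is True when the child is the upper half in coordinate k."""
    mid = 0.5 * (lo + hi)
    upper = X[idx] >= mid
    children = {}
    for i, bits in zip(idx, upper):
        children.setdefault(bits.tobytes(), []).append(int(i))
    return children

def child_box(lo, hi, key):
    """Recover the bounds of the child box encoded by a bit-pattern key."""
    bits = np.frombuffer(key, dtype=bool)
    mid = 0.5 * (lo + hi)
    return np.where(bits, mid, lo), np.where(bits, hi, mid)

rng = np.random.default_rng(3)
n, d = 1000, 200
X = rng.random((n, d))
lo, hi = np.zeros(d), np.ones(d)
children = occupied_children(X, np.arange(n), lo, hi)
print(len(children), "nonempty children instead of 2**200 =", 2**d)

# Recursing only into nonempty children gives the sparse tree.
key, members = next(iter(children.items()))
c_lo, c_hi = child_box(lo, hi, key)
```

The cost of a split is proportional to the number of points in the cell times d, independent of the 2^d possible children.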